Active Learning: A Visual Tour

Zeel B Patel, IIT Gandhinagar, patel_zeel@iitgn.ac.in

Nipun Batra, IIT Gandhinagar, nipun.batra@iitgn.ac.in

Rise of Supervised Learning¶

  • Machine learning has entered almost every field (Natural Language Processing (NLP), computer-aided diagnosis, optimization, and bioinformatics)
  • Much of this success is owed to supervised learning
  • Supervised learning needs labeled data

Data Annotation is Expensive¶

Speech Recognition¶

In [7]:
display(aud1)

Human Activity Recognition¶

In [10]:
fig.show()

All the Samples are Not Equally Important¶

SVC Says: Closer is Better¶

In [13]:
fig.show()

Confusion in Digit Classification¶

In [15]:
fig.show()
In [19]:
fig.show()

GP Needs 'Good' Data Points¶

In [25]:
fig.show()
In [27]:
fig.show()

The Basics of Active Learning¶

In [29]:
fig.show()

Random Baseline¶

An ML model can randomly sample data points and send them to the oracle for labeling. Given enough queries, random sampling will eventually capture the global distribution of the dataset in the training points. Active learning, in contrast, aims to reach the same performance with fewer labels by selecting the data points intelligently. Random sampling is therefore a natural baseline to compare active learning against.
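As a sketch, the random baseline simply draws a batch uniformly from the unlabeled pool (the names `random_query` and `pool` here are illustrative, not from the notebook):

```python
import numpy as np

def random_query(pool_indices, rng, batch_size=1):
    """Random baseline: pick `batch_size` unlabeled points uniformly,
    without replacement, and send them to the oracle for labeling."""
    return rng.choice(pool_indices, size=batch_size, replace=False)

rng = np.random.default_rng(0)
pool = np.arange(100)                      # indices of unlabeled points
queried = random_query(pool, rng, batch_size=5)
```

Each active-learning strategy below can be compared against this loop on the same pool.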

Different Scenarios for Active Learning¶

  1. Membership Query Synthesis: The model has access to the underlying input distribution and can synthesize samples from it. The generated samples are sent to the oracle for labeling.
  2. Stream-Based Selective Sampling: Samples arrive one at a time from a live stream; for each incoming sample, the model decides, based on some criterion, whether to query the oracle or discard it.
  3. Pool-Based Sampling: A pool of unlabeled samples is available up front (we called them potential train points in the prior discussion). Based on some criterion, the model queries a few samples from the pool.
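Stream-based selective sampling (scenario 2) can be sketched with a simple confidence threshold. The toy model, the 1-D stream, and the `threshold` value below are illustrative assumptions, not the notebook's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def should_query(model, x, threshold=0.6):
    """Stream-based selective sampling: request a label only when the
    model's top-class probability falls below `threshold`."""
    p = model.predict_proba(x.reshape(1, -1))[0]
    return p.max() < threshold

# Fit on a few seed points, then filter an incoming stream of samples.
rng = np.random.default_rng(0)
X_seed = np.array([[-2.0], [-1.5], [1.5], [2.0]])
y_seed = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_seed, y_seed)

stream = rng.normal(size=20)
queried = [x for x in stream if should_query(model, np.array([x]))]
```

Samples near the decision boundary (around 0 here) get queried; confident ones are discarded.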

Pool-Based Sampling¶

  1. Uncertainty Sampling: We query the samples the model is most uncertain about.
  2. Query by Committee: We create a committee of two or more models and query the samples on which the committee members' predictions disagree the most.

Uncertainty Sampling¶

Digit Classification with MNIST Dataset¶

  1. Least confident: We choose the samples for which the probability of the most probable class is minimum.

  2. Margin sampling: We choose the samples for which the difference between the probabilities of the most probable and the second most probable class is minimum.

  3. Entropy: For $N$ classes, entropy is calculated with the following equation, where $P(x_i)$ is the predicted probability of the $i^{th}$ class; we choose the samples with maximum entropy. \begin{equation} H(X) = -\sum\limits_{i=1}^{N}P(x_i)\log_2P(x_i) \end{equation}
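The three scores above take only a few lines of NumPy. The function names are ours; `probs` is an (n_samples, n_classes) array of predicted probabilities, such as the output of a classifier's `predict_proba`:

```python
import numpy as np

def least_confident(probs):
    """1 minus the most probable class's probability (higher = more uncertain)."""
    return 1.0 - probs.max(axis=1)

def margin(probs):
    """Gap between the top two class probabilities (smaller = more uncertain)."""
    s = np.sort(probs, axis=1)
    return s[:, -1] - s[:, -2]

def entropy(probs, eps=1e-12):
    """Shannon entropy (in bits) of each predicted class distribution."""
    return -(probs * np.log2(probs + eps)).sum(axis=1)

probs = np.array([[0.1, 0.8, 0.1],    # confident prediction
                  [0.4, 0.35, 0.25]]) # flat, uncertain prediction
```

All three scores rank the second (flatter) distribution as more uncertain, though they can disagree on harder cases.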

In [32]:
fig.show()
In [37]:
anim
Out[37]:
In [39]:
fig.show()

Regression on Noisy Sine Curve¶

In [41]:
fig.show()
In [44]:
anim
Out[44]:

Query by Committee (QBC)¶

  1. Same model with different hyperparameters
  2. Same model with different segments of the dataset
  3. Different models with the same dataset
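Option 3 (different models with the same dataset) might look like the following sketch, which ranks pool samples by vote entropy, one common disagreement measure. The committee members match the classifiers used later in this notebook, but the seed indices and helper names are illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def vote_entropy(committee, X_pool):
    """Disagreement score: entropy of the committee's hard votes per sample."""
    votes = np.stack([m.predict(X_pool) for m in committee])  # (C, N)
    n_classes = int(votes.max()) + 1
    scores = np.zeros(X_pool.shape[0])
    for j in range(X_pool.shape[0]):
        counts = np.bincount(votes[:, j].astype(int), minlength=n_classes)
        p = counts / counts.sum()
        nz = p[p > 0]
        scores[j] = -(nz * np.log2(nz)).sum()
    return scores

X, y = load_iris(return_X_y=True)
seed = np.arange(0, 150, 15)                 # tiny labeled seed set
committee = [RandomForestClassifier(random_state=0),
             LogisticRegression(max_iter=1000),
             SVC()]
for m in committee:
    m.fit(X[seed], y[seed])

query_idx = int(np.argmax(vote_entropy(committee, X)))  # most-disputed sample
```

The sample with the highest vote entropy is sent to the oracle, then all members are refit on the grown labeled set.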

Classification on Iris Dataset¶

In [46]:
fig.show()
In [52]:
anim
Out[52]:

In Animation 3, the boundaries between different colors are the decision boundaries. The points queried by the committee are those on which the learners disagree the most, as can be observed in the plot above. Initially, the models learn different decision boundaries from the same data; iteratively they converge to a similar hypothesis and start producing similar decision boundaries.

We now compare the overall F1-score of QBC against random sampling. QBC outperforms random sampling most of the time.

In [53]:
# Create traces
layout = Layout(
    paper_bgcolor='rgb(255,255,255)',
    plot_bgcolor='rgb(255,255,255)'
)
fig = go.Figure(layout=layout)
fig.add_trace(go.Scatter(x=list(range(1, 1 + len(list_pred_all_iris))),
                         y=overall_acc(list_pred_all_iris),
                         mode='lines+markers',
                         name='Query by committee',
                         line=dict(width=2, color=px.colors.DEFAULT_PLOTLY_COLORS[1]),
                         hovertemplate='(%{x:.2f},%{y:.2f})'))
fig.add_trace(go.Scatter(x=list(range(1, 1 + len(rand_pred_all_iris))),
                         y=overall_acc(rand_pred_all_iris),
                         mode='lines+markers',
                         name='Random sampling',
                         line=dict(width=2, color=px.colors.DEFAULT_PLOTLY_COLORS[0]),
                         hovertemplate='(%{x:.2f},%{y:.2f})'))

# Common axis styling
fig.update_yaxes(automargin=True, gridcolor='rgba(128,128,128,0.2)', gridwidth=1,
                 zerolinecolor='rgba(128,128,128,0.2)', zerolinewidth=1)
fig.update_xaxes(automargin=True, gridcolor='rgba(128,128,128,0.2)', gridwidth=1,
                 zerolinecolor='rgba(128,128,128,0.2)', zerolinewidth=1,
                 tickvals=list(range(1, 31)))
fig.update(layout_coloraxis_showscale=False)
fig.update_layout(title_text='<b>Figure 14:</b> Comparison between QBC and Random baseline on Iris dataset',
                  title_x=0.5,
                  xaxis_title='Iterations',
                  yaxis_title='Overall F1-score')
fig['layout']['xaxis'].update(side='bottom')
fig.show()

Comparison between Uncertainty sampling and QBC¶

  • For uncertainty sampling, we will use the Random Forest classifier.
  • For QBC, let us use three different classifiers (Random Forest Classifier, Logistic Regression, and Support Vector Classifier).
In [55]:
anim
Out[55]:
In [57]:
fig.show()

How many samples to query at once?¶

In [61]:
fig.show()

Few More Active Learning Strategies¶

  1. Expected model change: Select the samples that would cause the greatest change in the current model.
  2. Expected error reduction: Select the samples that are likely to most reduce the model's generalization error.
  3. Variance reduction: Select the samples that minimize the model's output variance.

Thank you¶